23 research outputs found

    Geo-located Twitter as the proxy for global mobility patterns

    With the advent of pervasive location sharing services, researchers have gained unprecedented access to direct records of human activity in space and time. This paper analyses geo-located Twitter messages in order to uncover global patterns of human mobility. Based on a dataset of almost a billion tweets recorded in 2012, we estimate volumes of international travelers with respect to their country of residence. We examine mobility profiles of different nations, looking at characteristics such as mobility rate, radius of gyration, diversity of destinations and the balance of inflows and outflows. The temporal patterns disclose the universal seasons of increased international mobility and the peculiar national nature of overseas travel. Our analysis of the community structure of the Twitter mobility network, obtained with iterative network partitioning, reveals spatially cohesive regions that follow the regional division of the world. Finally, we validate our results with global tourism statistics and mobility models provided by other authors, and argue that Twitter is a viable source to understand and quantify global mobility patterns. Comment: 17 pages, 13 figures
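The radius of gyration mentioned in this abstract is commonly defined as the root-mean-square distance of a user's recorded positions from their centre of mass. The sketch below illustrates that common definition only; the abstract does not say how the authors compute it. The function names, the Earth-radius constant, and the use of a plain latitude/longitude average as the centre of mass (which breaks down near the antimeridian and the poles) are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """RMS distance of (lat, lon) points from their centre of mass.

    Assumption: the centre of mass is approximated by averaging latitudes
    and longitudes, which is only reasonable away from the antimeridian.
    """
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    mean_sq = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points) / n
    return math.sqrt(mean_sq)
```

A user who only ever tweets from one spot gets a radius of zero, while a user whose tweets are spread across continents gets a radius of thousands of kilometres.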

    Scaling of city attractiveness for foreign visitors through big data of human economical and social media activity

    Scientific studies investigating laws and regularities of human behavior increasingly rely on the wealth of widely available digital information produced by human social activity. In this paper we leverage big data created by three different aspects of human activity (i.e., bank card transactions, geotagged photographs and tweets) in Spain to quantify city attractiveness for foreign visitors. An important finding of this paper is a strong superlinear scaling of city attractiveness with population size. The observed scaling exponent stays nearly the same for different ways of defining cities and for different data sources, emphasizing the robustness of our finding. Temporal variation of the scaling exponent is also considered in order to reveal seasonal patterns in the attractiveness. Comment: 8 pages, 3 figures, 1 table
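Superlinear scaling means that attractiveness A grows with population N roughly as A = c * N^beta with beta > 1. One common way to estimate such an exponent is an ordinary least-squares fit in log-log space, sketched below. The abstract does not state which fitting procedure the authors used, so this is an illustrative assumption, and the function name `fit_scaling_exponent` is hypothetical.

```python
import math


def fit_scaling_exponent(populations, values):
    """Fit values ~ c * populations**beta by least squares on the logarithms.

    Returns (beta, c). All inputs must be strictly positive.
    """
    if len(populations) != len(values) or len(populations) < 2:
        raise ValueError("need two equally long sequences with at least 2 entries")
    xs = [math.log(p) for p in populations]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("populations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                             # slope in log-log space
    prefactor = math.exp(mean_y - beta * mean_x)  # intercept mapped back from logs
    return beta, prefactor
```

A fitted beta above 1 would indicate that larger cities attract disproportionately more foreign activity per resident; repeating the fit per month would give the kind of temporal variation the abstract refers to.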

    Mining Urban Performance: Scale-Independent Classification of Cities Based on Individual Economic Transactions

    Intensive development of urban systems creates a number of challenges for urban planners and policy makers seeking to maintain sustainable growth. Running efficient urban policies requires meaningful urban metrics that quantify important urban characteristics, including various aspects of actual human behavior. Since city size is known to have a major, yet often nonlinear, impact on human activity, it also becomes important to develop scale-free metrics that capture qualitative city properties beyond the effects of scale. The recent availability of extensive datasets created by human activity involving digital technologies creates new opportunities in this area. In this paper we propose a novel approach to city scoring and classification based on quantitative scale-free metrics related to the economic activity of city residents, as well as domestic and foreign visitors. It is demonstrated on the example of Spain, but the proposed methodology is of a general character. We employ a new source of large-scale ubiquitous data, which consists of anonymized countrywide records of bank card transactions collected by one of the largest Spanish banks. Different aspects of the classification reveal important properties of Spanish cities, which significantly complement the pattern that might be discovered with the official socioeconomic statistics. Comment: 10 pages, 7 figures, to be published in the proceedings of the ASE BigDataScience 2014 conference

    Collective Prediction of Individual Mobility Traces for Users with Short Data History

    We present and test a sequential learning algorithm for the prediction of human mobility that leverages large datasets of sequences to improve prediction accuracy, in particular for users with a short and non-repetitive data history such as tourists in a foreign country. The algorithm compensates for the difficulty of predicting the next location when there is limited evidence of past behavior by leveraging the availability of sequences of other users in the same system that provide redundant records of typical behavioral patterns. We test the method on a dataset of 10 million roaming mobile phone users in a European country. The average prediction accuracy is significantly higher than that of individual sequence prediction algorithms, primarily constant order Markov models derived from the user’s own data, that have been shown to achieve high accuracy in previous studies of human mobility. The proposed algorithm is generally applicable to improve any sequential prediction when there is a sufficiently rich and diverse dataset of sequences.
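The baseline named in this abstract, a Markov model derived from the user's own data, can be illustrated with an order-1 model that is built while the sequence is being observed. The sketch below is a minimal illustration under assumptions not taken from the source: the predictor guesses the most frequent successor of the current location seen so far, a step with no history counts as a miss, and ties are broken by first occurrence. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def sequential_markov_accuracy(sequence):
    """Fraction of next-location guesses that an order-1 Markov model,
    learned on the fly from the same sequence, gets right."""
    transitions = defaultdict(Counter)  # location -> counts of observed successors
    correct = 0
    for prev, actual in zip(sequence, sequence[1:]):
        successors = transitions[prev]
        # Predict before learning from this step; no history means no guess.
        guess = successors.most_common(1)[0][0] if successors else None
        if guess == actual:
            correct += 1
        transitions[prev][actual] += 1  # update the model with the observed step
    total = len(sequence) - 1
    return correct / total if total > 0 else 0.0
```

The first visits to each location are necessarily missed because the model has nothing to go on yet, which is the short-history problem that the collective approach in this abstract is meant to address.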

    Correct/incorrect prediction for given position for three selected sequences.

    Prediction accuracy in our setup depends crucially on the availability of good experts in the ensemble. In the three example sequences we see a color-coded depiction of prediction success or failure adjacent to the numbers of awake and best experts, i.e. experts that can provide a prediction at a given step, and those among them which have accumulated the minimum loss up to that step. The three sequences are rather typical examples seen in the test dataset. Low numbers of best and awake experts almost invariably lead to incorrect predictions, and vice versa.

    Comparison with the best expert in the ensemble.

    The best expert here is declared at the end of the sequence, as the Markov model in the expert ensemble which accumulated the minimum loss during prediction. If more than one expert shares this property, a representative is chosen arbitrarily. (A) The EW forecaster’s prediction accuracy compared to the best expert’s prediction accuracy. The forecaster’s accuracy is superior more often than not, and with larger differences, resulting in a 4% average advantage. (B) The O(1) Markov model constructed sequentially from the user’s own locations as they are recorded in real time is less accurate than the best expert for a large majority of the test sequences. It may appear slightly surprising that another user’s data is better at predicting a given user’s location sequence, but the user’s own Markov model is constructed sequentially, needing time to learn the patterns, while experts’ Markov models enter the “competition” fully constructed.

    Prediction per position and per hour of the day.

    (A) Average prediction accuracy per position n in the sequence, for the EW forecaster and Markov models of order k = 1, 2, 3. The best Markov model is O(1) and is on par with the EW forecaster for the first half-day after the start of the user’s sequence and the prediction process. EW achieves a stable (average) lead after that point. The quasi-periodic pattern is due to the fact that most roamers arrive in the visited country during the day, combined with the fluctuation between day and night prediction accuracies seen in (B). Prediction accuracy is significantly higher in the period between 02:00–08:00 because of the much higher regularity of mobility patterns during these hours.

    EW forecaster prediction accuracy.

    (A) Percentage of sequences predicted with a certain accuracy (in bins of 10%) for the EW forecaster and Markov models of order k = 1, 2, 3 constructed sequentially from the user’s own data as the sequence of locations is observed in time. We use a learning rate η = 3. The EW forecaster improves on the performance of the best Markov model, which again turns out to be O(1) [27, 32], by an average of 5%. A detailed comparison between the two is depicted in (B), the scatterplot of difference in prediction accuracy per sequence. For more than 90% of the test sequences, the EW forecaster is more accurate.
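An exponentially weighted (EW) forecaster of the general kind named here weights each expert by exp(-η · cumulative loss) and follows the weighted majority of the experts that are awake, i.e. able to predict at the current step. The sketch below shows that generic scheme with η = 3 as the default. It is not the authors' implementation: the 0/1 loss, the rule that sleeping experts keep their loss unchanged, the tie-breaking, and both function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def ew_forecast(expert_predictions, cumulative_loss, eta=3.0):
    """Weighted-majority guess over awake experts.

    expert_predictions maps expert id -> predicted location, or None if the
    expert is asleep. cumulative_loss maps expert id -> loss accumulated so far.
    Returns None when no expert is awake.
    """
    awake = {e: g for e, g in expert_predictions.items() if g is not None}
    if not awake:
        return None
    # Subtract the smallest awake loss so the exponentials cannot all underflow.
    base = min(cumulative_loss.get(e, 0.0) for e in awake)
    votes = defaultdict(float)
    for expert, guess in awake.items():
        votes[guess] += math.exp(-eta * (cumulative_loss.get(expert, 0.0) - base))
    return max(votes, key=votes.get)


def update_losses(expert_predictions, cumulative_loss, actual):
    """Add a 0/1 loss to every awake expert that guessed wrong (assumed rule)."""
    for expert, guess in expert_predictions.items():
        if guess is not None and guess != actual:
            cumulative_loss[expert] = cumulative_loss.get(expert, 0.0) + 1.0
```

With η = 3, a single mistake cuts an expert's weight by a factor of about 20 (e^3), so experts that have erred are quickly outvoted by experts with a clean record.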

    Prediction accuracy dependence on sampling and T_past.

    (A) Average prediction accuracy for particular filterings of the expert ensemble. We randomly sample experts from the ensemble and additionally we filter the experts’ sequence fragments so that only those that end within a time window T_past are included. Decreasing the sampling rate and/or reducing T_past decimates the ensemble, and beyond a point this reduces the accuracy of the forecaster. (B) The average percentage of distinct transitions X_{n−1} → X_n in a test sequence that are contained in at least one expert in the ensemble after filtering. Prediction accuracy in (A) starts dropping when the sampling rate is reduced beyond a few percent, showing that the ensemble is very diverse and robust. A very slight drop in performance comes with including all experts, due to the logarithmic search costs of the forecaster when the ensemble grows.
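The coverage quantity in panel (B), the share of distinct transitions in a test sequence that appear in at least one expert, can be sketched as a set computation. This is an illustrative reading of the caption; the function names are hypothetical and the sampling and T_past filtering steps are assumed to have been applied to `expert_sequences` beforehand.

```python
def distinct_transitions(sequence):
    """Set of distinct consecutive pairs (X_{n-1}, X_n) in a sequence."""
    return set(zip(sequence, sequence[1:]))


def transition_coverage(test_sequence, expert_sequences):
    """Percentage of the test sequence's distinct transitions that occur in
    at least one expert sequence. Returns 0.0 if there are no transitions."""
    wanted = distinct_transitions(test_sequence)
    if not wanted:
        return 0.0
    known = set()
    for seq in expert_sequences:
        known |= distinct_transitions(seq)
    return 100.0 * len(wanted & known) / len(wanted)
```

For example, a test sequence A, B, C, A has three distinct transitions; if the experts jointly contain A → B and C → A but not B → C, the coverage is two thirds.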